A Statistical Approach to Machine Aided Translation of Terminology Banks
Abstract
This paper reports on a new statistical approach to machine-aided translation of terminology banks. The text in the bank is hyphenated and then dissected into roots of 1 to 3 syllables. Both hyphenation and dissection are done with a set of initial probabilities of syllables and roots. The probabilities are repeatedly revised using an EM algorithm. After each iteration of hyphenation or dissection, the resulting syllables and roots are counted to yield a more precise estimate of the probabilities. The set of roots rapidly converges to a set of most likely roots. Preliminary experiments have shown promising results. From a terminology bank of more than 4,000 terms, the algorithm extracts 223 general and chemical roots, of which 91% are actually roots. The algorithm dissects a word into roots with a hit rate of around 86%. The set of roots and their hand translations is then used in a compositional translation of the terminology bank. One can expect translation of a terminology bank using this approach to be more cost-effective and consistent, with better closure.
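The dissect-count-reestimate loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes words are already hyphenated into syllable lists, it uses a hard-EM (Viterbi) variant that counts only the single best dissection per word rather than full expected counts, and the names `best_dissection`, `reestimate`, `max_len` and `floor`, as well as the example syllables, are hypothetical.

```python
import math
from collections import Counter


def best_dissection(syllables, prob, max_len=3, floor=1e-6):
    """Return the most probable split of a syllable list into roots.

    Each root spans 1..max_len syllables. Roots missing from `prob`
    get the small probability `floor` (an assumed smoothing choice).
    """
    n = len(syllables)
    # best[i] = (log-probability of best split of syllables[:i], start of last root)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            root = "".join(syllables[start:end])
            score = best[start][0] + math.log(prob.get(root, floor))
            if score > best[end][0]:
                best[end] = (score, start)
    # Follow the back-pointers from the end of the word to recover the roots.
    roots, end = [], n
    while end > 0:
        start = best[end][1]
        roots.append("".join(syllables[start:end]))
        end = start
    return roots[::-1]


def reestimate(words, prob, iterations=5):
    """Alternate dissection and counting to revise the root probabilities."""
    for _ in range(iterations):
        counts = Counter()
        for syllables in words:
            counts.update(best_dissection(syllables, prob))
        total = sum(counts.values())
        prob = {root: count / total for root, count in counts.items()}
    return prob


# Hypothetical toy data: pre-hyphenated terms and initial root probabilities.
words = [["meth", "yl"], ["eth", "yl"], ["meth", "ane"], ["eth", "ane"]]
initial = {"meth": 0.3, "yl": 0.3, "eth": 0.2, "ane": 0.2}
revised = reestimate(words, initial)
```

The initial probabilities matter in this sketch: with an empty table every candidate root scores the same, so the search prefers the fewest roots and each short word collapses into a single root.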
Similar resources
Enhancing Statistical Machine Translation with Bilingual Terminology in a CAT Environment
In this paper, we address the problem of extracting and integrating bilingual terminology into a Statistical Machine Translation (SMT) system for a Computer Aided Translation (CAT) tool scenario. We develop a framework that, taking as input a small amount of parallel in-domain data, gathers domain-specific bilingual terms and injects them in an SMT system to enhance the translation productivity...
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
Post-MT Term Swapper: Supplementing a Statistical Machine Translation System with a User Dictionary
A statistical machine translation (SMT) system requires homogeneous training data in order to get domain-sensitive (or context-sensitive) terminology translations. If the data consists of various domains, it is difficult for an SMT system to learn context-sensitive terminology mappings probabilistically. Yet, terminology translation accuracy is an important issue for MT users. This paper explor...
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating the different parts. Persian has multi-part words as well. In Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use a space character instead of the half-space character. This common incorrect use of the space leads to some s...
From Statistical Term Extraction to Hybrid Machine Translation
This study presents a new hybrid approach for translation equivalent selection within a transfer-based machine translation system using an intertwined net of traditional linguistic methods together with statistical techniques. Detailed evaluation reveals that the translation quality can be improved substantially in this way.
Publication date: 1992